Large-Scale Multi-Label Learning with Incomplete Label Assignments
Authors
Abstract
Multi-label learning deals with classification problems in which each instance can be assigned multiple labels simultaneously. Conventional multi-label learning approaches mainly focus on exploiting label correlations. They usually assume, explicitly or implicitly, that the label sets of training instances are complete, with no missing labels. However, in many real-world multi-label datasets, the label assignments for training instances can be incomplete: the labeler may miss some ground-truth labels. This problem is especially common when the number of instances is very large and the labeling cost is very high, which makes it almost impossible to obtain a fully labeled training set. In this paper, we study the problem of large-scale multi-label learning with incomplete label assignments. We propose an approach, called Mpu, based on positive and unlabeled stochastic gradient descent and stacked models. Unlike prior work, our method considers missing labels and label correlations simultaneously, both effectively and efficiently, and it is highly scalable, with time complexity linear in the size of the data. Extensive experiments on two real-world multi-label datasets show that our Mpu model consistently outperforms other commonly used baselines.
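The abstract names its ingredients (positive and unlabeled stochastic gradient descent, stacked models) without giving the algorithm. The following is only a minimal sketch of how those two ideas can fit together, not the Mpu method itself. It assumes one logistic model per label, treats every unobserved label as a down-weighted negative (a common positive-unlabeled heuristic), and stacks a second stage on the first-stage scores so that label correlations can be used. All names (`train_pu_sgd`, `unlabeled_weight`, `train_stacked`) and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_pu_sgd(X, Y, n_labels, unlabeled_weight=0.3, lr=0.1, epochs=50, seed=0):
    """Train one weighted logistic model per label with plain SGD.

    Y[i] is the set of labels observed for instance i. A label that is
    absent from Y[i] is unlabeled, not a confirmed negative, so its loss
    term is scaled by unlabeled_weight (an assumed hyperparameter).
    Each epoch touches every (instance, label) pair once, so the cost is
    linear in the number of instances.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    W = [[0.0] * (dim + 1) for _ in range(n_labels)]  # last entry is the bias
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = X[i] + [1.0]
            for k in range(n_labels):
                target = 1.0 if k in Y[i] else 0.0
                weight = 1.0 if target == 1.0 else unlabeled_weight
                p = sigmoid(sum(w * v for w, v in zip(W[k], x)))
                grad = weight * (p - target)  # gradient of the weighted log loss
                for j in range(dim + 1):
                    W[k][j] -= lr * grad * x[j]
    return W


def predict_scores(W, x):
    x1 = x + [1.0]
    return [sigmoid(sum(w * v for w, v in zip(wk, x1))) for wk in W]


def train_stacked(X, Y, n_labels, **kwargs):
    # Stage 1: independent per-label models on the raw features.
    base = train_pu_sgd(X, Y, n_labels, **kwargs)
    # Stage 2: append stage-1 scores so each label can see the others.
    X2 = [x + predict_scores(base, x) for x in X]
    meta = train_pu_sgd(X2, Y, n_labels, **kwargs)
    return base, meta


def predict_stacked(base, meta, x):
    return predict_scores(meta, x + predict_scores(base, x))


# Toy data: label 1 is missing from the second instance's label set.
X = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [-1.0, 0.0], [-0.9, 0.2], [-1.1, -0.1]]
Y = [{0, 1}, {0}, {0, 1}, set(), set(), set()]
base, meta = train_stacked(X, Y, n_labels=2)
scores = predict_stacked(base, meta, [1.0, 0.0])
```

Down-weighting unobserved labels is one of several ways to handle positive-unlabeled data; the weight would normally be tuned on held-out data.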
Similar resources
An Efficient Large-scale Semi-supervised Multi-label Classifier Capable of Handling Missing labels
Multi-label classification has received considerable interest in recent years. Multi-label classifiers have to address many problems, including handling large-scale datasets with many instances and a large set of labels, compensating for missing label assignments in the training set, considering correlations between labels, and exploiting unlabeled data to improve prediction performance. To ...
Enhancing multi-label classification by modeling dependencies among labels
In this paper, we propose a novel framework for multi-label classification, which directly models the dependencies among labels using a Bayesian network. Each node of the Bayesian network represents a label, and the links and conditional probabilities capture the probabilistic dependencies among multiple labels. We employ our Bayesian network structure learning method, which guarantees to find ...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention in recent years, due to the increasing number of modern applications associated with multi-label data. Despite the field's short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy for multi-label learning by leveraging label-specific ...
Semi-Supervised Multi-Label Learning with Incomplete Labels
The problem of incomplete labels is frequently encountered in many application domains where the training labels are obtained via crowd-sourcing. The label incompleteness significantly increases the difficulty of acquiring accurate multi-label prediction models. In this paper, we propose a novel semi-supervised multi-label method that integrates low-rank label matrix recovery into the manifold ...
Multiple Kernel and Multi-label Learning for Image Categorization
By Serhat Selçuk Bucak. One crucial step in recovering useful information from large image collections is image categorization. The goal of image categorization is to find the relevant labels for a given image from a closed set of labels. Despite the huge interest and significant contributions by the research community, there rema...